Protein Remote Homology Detection Based on Binary Profiles

Authors

  • Qiwen Dong
  • Lei Lin
  • Xiaolong Wang

Abstract

Remote homology detection is a key element of protein structure and function analysis in computational and experimental biology. This paper presents a simple representation of protein sequences that uses the evolutionary information contained in profiles for efficient remote homology detection. Frequency profiles are calculated directly from the multiple sequence alignments output by PSI-BLAST and converted into binary profiles using a probability threshold. These binary profiles constitute a new building block for protein sequences. Each protein sequence is mapped to a high-dimensional vector by counting the occurrences of each binary profile. The resulting vectors are then used to train support vector machine (SVM) classifiers, which in turn classify the test protein sequences. The method is further improved by applying an efficient feature extraction algorithm from natural language processing, namely latent semantic analysis (LSA). Tests on the SCOP 1.53 database show that the method based on binary profiles outperforms methods based on many other basic building blocks, including N-grams, patterns and motifs. Its ROC50 score is 0.698, nearly 10 percent higher than that of the other methods.
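
The representation described in the abstract (frequency profile, then binary profiles, then occurrence counts) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: column frequencies are plain residue counts over the alignment with no pseudocounts or sequence weighting, a position's binary profile sets one bit per amino acid whose frequency reaches the threshold, the threshold of 0.3 is illustrative rather than taken from the paper, and the helper names (`frequency_profile`, `binary_profile`, `build_vocabulary`, `count_vector`, `word_letters`) are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# The 20 standard amino acids in a fixed order; bit j of a binary profile
# refers to AMINO_ACIDS[j].
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

BinaryWord = Tuple[int, ...]


def frequency_profile(alignment: Sequence[str]) -> List[List[float]]:
    """Per-column amino acid frequencies of an MSA given as equal-length strings.

    Assumption: plain counts over the 20 standard residues; gaps and any other
    symbols are skipped, and no pseudocounts or sequence weights are applied.
    """
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append([counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS])
    return profile


def binary_profile(profile: List[List[float]], threshold: float = 0.3) -> List[BinaryWord]:
    """Turn each column into a 20-bit word: 1 where the frequency reaches the threshold.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return [tuple(1 if p >= threshold else 0 for p in column) for column in profile]


def build_vocabulary(training_profiles: List[List[BinaryWord]]) -> List[BinaryWord]:
    """Collect the distinct binary profiles seen in training, in a fixed sorted order."""
    return sorted({word for bp in training_profiles for word in bp})


def count_vector(bp: List[BinaryWord], vocabulary: List[BinaryWord]) -> List[int]:
    """Occurrence count of each vocabulary word; words outside the vocabulary are dropped."""
    counts = Counter(bp)
    return [counts[word] for word in vocabulary]


def word_letters(word: BinaryWord) -> str:
    """Readable form of a binary profile: the amino acids whose bit is set."""
    return "".join(aa for aa, bit in zip(AMINO_ACIDS, word) if bit)


if __name__ == "__main__":
    # Toy alignment (hypothetical): a query plus two aligned homologs.
    msa = ["ACDAA", "ACEAA", "GCDAA"]
    bp = binary_profile(frequency_profile(msa))
    vocab = build_vocabulary([bp])
    print([word_letters(w) for w in bp])  # ['AG', 'C', 'DE', 'A', 'A']
    print(dict(zip(map(word_letters, vocab), count_vector(bp, vocab))))
    # {'DE': 1, 'C': 1, 'A': 2, 'AG': 1}
```

In the toy alignment, columns 0 and 2 each contain two residues above the threshold, so they become two-bit "words", while the conserved columns become single-bit words; the count vector is the protein's bag-of-words representation. In the full method these vectors would be fed to an SVM, or first projected with LSA, i.e. a truncated singular value decomposition of the binary-profile-by-protein count matrix; neither step is shown here.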

Similar resources

Protein Remote Homology Detection and Fold Recognition based on Features Extracted from Frequency Profiles

Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. The performance of SVM depends on the method of protein vectorization, so a suitable representation of the protein sequence is a key step for the SVM-based me...

Profile-based direct kernels for remote homology detection and fold recognition

MOTIVATION Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. RESULTS We...

A Discriminative Method for Protein Remote Homology Detection Based on N-nary Profiles

Protein homology detection is a key problem in computational biology. In this paper, a novel building block for proteins, called the N-nary profile, which contains the evolutionary information of protein sequence frequency profiles, is presented. The protein sequence frequency profiles calculated from the multiple sequence alignments output by PSI-BLAST are converted into N-nary profiles. Such...

Support vector machines with profile-based kernels for remote protein homology detection.

Two new techniques for remote protein homology detection, particularly suited to sparse data, are introduced. These methods are based on position-specific scoring matrices or profiles and use a support vector machine (SVM) for discrimination. The performance on standard benchmarks outperforms previous non-discriminative techniques and is comparable to that of other SVM-based methods while giving ...

Protein Remote Homology Detection by Combining Profile-based Protein Representation with Local Alignment Kernel

Protein remote homology detection has attracted a great deal of interest as it is one of the most important problems in bioinformatics. Profile-based methods have recently achieved state-of-the-art performance. A key step to improve the performance of these methods is to find a suitable approach to use the evolutionary information in the profiles. In this study, we propose the profile-based prote...

Publication date: 2007